Feat/slice intersect multi series #2592

Open
wants to merge 8 commits into
base: master

Conversation

ymatzkevich

@ymatzkevich ymatzkevich commented Nov 12, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2042.

Summary

The method TimeSeries.slice_intersect() (see documentation) intersects a TimeSeries with another one so that both end up with the same time indices. However, intersecting more than two series requires either calling that method several times or doing the intersection by hand, e.g. with xarray. The new function slice_intersect() introduced in this PR solves this for an arbitrary number of TimeSeries.

Essentially, given a list of TimeSeries with the same time index type, slice_intersect() returns the aligned list, meaning that all TimeSeries in it have the same start and end time (if the intersection exists).
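The alignment described above can be illustrated with a minimal plain-Python sketch. It is not the darts implementation: plain lists of dates stand in for time indexes, all series are assumed to share a daily frequency, and `daily_index` and `align_to_common_span` are hypothetical helper names used only for illustration.

```python
from datetime import date, timedelta


def daily_index(start, periods):
    """Build a list of consecutive dates (stand-in for a daily time index)."""
    return [start + timedelta(days=i) for i in range(periods)]


def align_to_common_span(indexes):
    """Trim every index to the span shared by all of them.

    Hypothetical helper for illustration only, not part of darts.
    Returns empty lists when the indexes do not overlap.
    """
    if not indexes or any(len(idx) == 0 for idx in indexes):
        return [[] for _ in indexes]
    common_start = max(idx[0] for idx in indexes)  # latest start
    common_end = min(idx[-1] for idx in indexes)   # earliest end
    # If common_start > common_end, nothing passes the filter below.
    return [
        [t for t in idx if common_start <= t <= common_end]
        for idx in indexes
    ]


a = daily_index(date(2024, 1, 1), 10)  # Jan 1 to Jan 10
b = daily_index(date(2024, 1, 4), 10)  # Jan 4 to Jan 13
c = daily_index(date(2024, 1, 2), 6)   # Jan 2 to Jan 7

aligned = align_to_common_span([a, b, c])
# Every aligned index now runs from Jan 4 to Jan 7.
```

The shared span is bounded by the latest start and the earliest end, which is why a single series that starts late or ends early shortens all the others.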

Other Information

If the given TimeSeries do not all have the same time index type (e.g. some have a RangeIndex and others a DatetimeIndex), the function raises an error.
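A check of this kind might look like the sketch below. It rests on assumptions: Python's built-in `range` stands in for a RangeIndex and a list of `datetime` objects for a DatetimeIndex, the helper names `index_kind` and `ensure_same_index_kind` are hypothetical, and `ValueError` is chosen for illustration since the PR description does not say which exception type is raised.

```python
from datetime import datetime


def index_kind(index):
    """Classify a stand-in index as 'range' (integer) or 'datetime'."""
    if isinstance(index, range):
        return "range"
    if all(isinstance(t, datetime) for t in index):
        return "datetime"
    raise TypeError("unsupported index type")


def ensure_same_index_kind(indexes):
    """Raise if the indexes mix integer and datetime kinds.

    Hypothetical helper; the exception type is an assumption.
    """
    kinds = {index_kind(idx) for idx in indexes}
    if len(kinds) > 1:
        raise ValueError(f"mixed time index types: {sorted(kinds)}")
    return kinds.pop() if kinds else None


# Homogeneous input passes the check.
kind = ensure_same_index_kind([range(0, 5), range(2, 8)])

# Mixed input is rejected before any slicing happens.
try:
    ensure_same_index_kind([range(0, 5), [datetime(2024, 1, 1)]])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

Validating up front keeps the failure early and explicit, instead of surfacing later as a confusing comparison error between integers and timestamps.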

@ymatzkevich ymatzkevich force-pushed the feat/slice_intersect_multi_series branch from 0ae1729 to b6f6812 Compare November 12, 2024 15:48

codecov bot commented Nov 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.10%. Comparing base (d909589) to head (3e5d5a0).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2592      +/-   ##
==========================================
- Coverage   94.14%   94.10%   -0.05%     
==========================================
  Files         139      139              
  Lines       14884    15003     +119     
==========================================
+ Hits        14013    14119     +106     
- Misses        871      884      +13     


Collaborator

@dennisbader dennisbader left a comment

Thanks for this PR @ymatzkevich, it looks really good already 🚀
I just had some minor suggestions here and there. After that we can merge.

Review comments:
darts/timeseries.py (7 comments, all outdated)
darts/tests/test_timeseries.py (2 comments, 1 outdated)
@ymatzkevich
Author

ymatzkevich commented Dec 13, 2024

To test the efficiency of the new logic, I ran a benchmark (removed from the unit tests because of its computational cost) that intersects a large number of TimeSeries. For example, something like:
sequence = [seriesA, seriesB]*500 and then int_sequence = slice_intersect(sequence)
This made it possible to compare the different options for implementing slice_intersect. While xarray.align was slightly faster, it did not preserve mixed frequencies properly. The new logic still scales efficiently and passes all the unit tests. By first intersecting the time indexes, we can compute the intersection without creating an additional TimeSeries for each element of the given Sequence[TimeSeries].
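The index-first idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code in this PR: each series is modelled as a `(range, values)` pair with a step-1 integer index, mixed frequencies are not handled, and `intersect_ranges` and `slice_all` are hypothetical names.

```python
from functools import reduce


def intersect_ranges(r1, r2):
    """Intersect two step-1 integer ranges (empty if they do not overlap)."""
    return range(max(r1.start, r2.start), min(r1.stop, r2.stop))


def slice_all(series):
    """Slice every (index, values) pair to the common index.

    The common index is computed once from the indexes alone, then each
    series is sliced exactly once, so no intermediate series are built.
    Assumes a non-empty input sequence.
    """
    common = reduce(intersect_ranges, (idx for idx, _ in series))
    result = []
    for idx, values in series:
        offset = common.start - idx.start  # position of the common start
        result.append((common, values[offset:offset + len(common)]))
    return result


# Stand-ins for seriesA and seriesB from the comment above.
series_a = (range(0, 100), list(range(0, 100)))
series_b = (range(40, 150), [v * 2 for v in range(40, 150)])

sequence = [series_a, series_b] * 500
int_sequence = slice_all(sequence)
# All 1000 results share the index range(40, 100).
```

Folding over the indexes costs one cheap pass, after which the values of each series are touched only once. Pairwise intersection of whole series would instead create a new sliced object at every step.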

Successfully merging this pull request may close these issues.

Union function to find the intersection of time series